Fix crash in def_readwrite for non-smart-holder properties of smart-holder classes (v2) #6008

Open
virtuald wants to merge 6 commits into pybind:master from virtuald:enum-mismatch-2

virtuald (Contributor) commented Mar 17, 2026

Description

Alternative fix for #6003. I made it primarily because I'm not sure whether this approach is better, and to check that all tests pass. See the discussion on that PR for background and analysis.

Suggested changelog entry:

Fix crash in def_readwrite for non-smart-holder properties of smart-holder classes

- Occurs with non-smart-holder property of smart-holder class

rwgk (Collaborator) commented Mar 22, 2026

@virtuald @oremanj I think this PR is definitely the way to go (suggestion to close PR #6003 in favor of this PR).

Below is an Opus 4.6 1M Thinking analysis of the fix.

I think we should merge this fix as-is, because it's a strict improvement, and maybe do other things in follow-on PRs.

For this PR: I still need to look carefully at the tests.


The bug

When a smart-holder class (py::classh<T>) has a by-value member whose type uses a non-shared_ptr holder (e.g., an enum bound via py::enum_<E>, which uses unique_ptr<E>), def_readwrite creates an aliasing shared_ptr<E> pointing into the parent object. The shared_ptr-to-Python cast path then calls cast_holder(srcs, &src), which tries to stuff the shared_ptr into the target type's holder. For unique_ptr-held types, this is UB — it reinterprets shared_ptr memory as unique_ptr, leading to invalid free / double-free on deallocation.
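
To make the aliasing concrete, here is a minimal standalone sketch that uses only the standard library (no pybind11). Owner and alias_member are hypothetical names, and TinyLevel stands in for the bound enum. It only illustrates that the aliasing shared_ptr points at a subobject of the parent rather than at its own heap allocation, which is why handing that pointer to a unique_ptr holder (which would eventually call delete on it) cannot be valid.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins: an enum and a class that holds it by value.
enum class TinyLevel { A = 0, B = 1 };

struct Owner {
    TinyLevel level = TinyLevel::A;
};

// Build a shared_ptr to the member that shares ownership with the parent.
// Uses the standard aliasing constructor: shared_ptr(const shared_ptr<Y>&, T*).
inline std::shared_ptr<TinyLevel> alias_member(const std::shared_ptr<Owner> &owner) {
    return std::shared_ptr<TinyLevel>(owner, &owner->level);
}
```

The returned pointer shares the parent's control block, so it keeps the parent alive, but its address is inside the Owner object and was never returned by new on its own.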

The fix

In copyable_holder_caster<type, shared_ptr<type>, ...>::cast():

static handle
cast(const std::shared_ptr<type> &src, return_value_policy policy, handle parent) {
    const auto *ptr = src.get();
    typename type_caster_base<type>::cast_sources srcs{ptr};
    if (srcs.creates_smart_holder()) {
        return smart_holder_type_caster_support::smart_holder_from_shared_ptr(
            src, policy, parent, srcs.result);
    }

    auto *tinfo = srcs.result.tinfo;
    if (tinfo != nullptr && tinfo->holder_enum_v == holder_enum_t::std_shared_ptr) {
        return type_caster_base<type>::cast_holder(srcs, &src);
    }

    if (parent) {
        return type_caster_base<type>::cast(
            srcs, return_value_policy::reference_internal, parent);
    }

    throw cast_error("Unable to convert std::shared_ptr<T> to Python when the bound type "
                     "does not use std::shared_ptr or py::smart_holder as its holder type");
}

Three cases:

  1. smart_holder target: handled first via smart_holder_from_shared_ptr (unchanged).
  2. std::shared_ptr holder target: proceeds to cast_holder (the original path, now gated).
  3. Any other holder (e.g., unique_ptr for enums): falls back to reference_internal if parent is available, otherwise throws.
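
The three cases can be summarized as a small decision function. This is a simplified sketch, not pybind11 code: HolderKind, CastPath, and choose_cast_path are hypothetical names introduced only to mirror the branch order of the cast() function quoted above.

```cpp
// Hypothetical, simplified model of the branch order in cast().
// None of these names exist in pybind11.
enum class HolderKind { smart_holder, std_shared_ptr, other };

enum class CastPath {
    smart_holder_from_shared_ptr,  // case 1
    cast_holder,                   // case 2
    reference_internal,            // case 3, parent available
    throw_cast_error               // case 3, no parent
};

// Written as a single return expression so it stays constexpr in C++11.
constexpr CastPath choose_cast_path(HolderKind holder, bool has_parent) {
    return holder == HolderKind::smart_holder     ? CastPath::smart_holder_from_shared_ptr
           : holder == HolderKind::std_shared_ptr ? CastPath::cast_holder
           : has_parent                           ? CastPath::reference_internal
                                                  : CastPath::throw_cast_error;
}
```

Note that the parent check only matters in the last case; the first two paths are taken regardless of whether a parent is present.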

Why the reference_internal fallback is safe

Traced through type_caster_generic::cast() with policy = reference_internal:

  1. valueptr = src (raw pointer to the member inside the parent object).
  2. wrapper->owned = false — pybind11 does NOT own this memory.
  3. keep_alive_impl(inst, parent) — returned Python object holds a reference to parent, keeping the parent (and thus the member) alive.
  4. tinfo->init_instance(wrapper, existing_holder) is called with existing_holder = nullptr.

For unique_ptr-held types, init_holder checks:

} else if (detail::always_construct_holder<holder_type>::value || inst->owned) {
    new (std::addressof(v_h.holder<holder_type>())) holder_type(v_h.value_ptr<type>());
    v_h.set_holder_constructed();
}

Since owned = false and always_construct_holder<unique_ptr<T>> defaults to false, the condition is false || false — the holder is never constructed. No unique_ptr wraps the pointer, so no double-free can occur.

On deallocation, pybind11 checks holder_constructed() (which is false), skips holder destruction, and since owned = false, doesn't call delete on the raw pointer. The raw pointer remains valid because keep_alive_impl guarantees the parent outlives the returned object.
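
The ownership argument above can be modeled with a toy struct. This is only a sketch of the reasoning as described in this comment, not pybind11's real instance layout: ToyInstance, toy_init_holder, and toy_delete_count are hypothetical names.

```cpp
#include <cassert>

// Hypothetical toy model of an instance wrapper; not pybind11's real layout.
struct ToyInstance {
    bool owned = false;
    bool holder_constructed = false;
};

// Mirrors the quoted condition: the holder is constructed only if the holder
// type always requires it or the instance owns its value.
inline void toy_init_holder(ToyInstance &inst, bool always_construct_holder) {
    if (always_construct_holder || inst.owned) {
        inst.holder_constructed = true;
    }
}

// How many times the value would be destroyed on deallocation in this model:
// once via the holder if it was constructed, once directly if owned without
// a holder, and never for a non-owned reference.
inline int toy_delete_count(const ToyInstance &inst) {
    if (inst.holder_constructed) {
        return 1;
    }
    return inst.owned ? 1 : 0;
}
```

In this model a reference_internal wrapper (owned == false, no forced holder) ends with a delete count of zero, which matches the claim that the member is never freed through the returned object.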

The shared_ptr temporary lifetime is not a concern

The aliasing shared_ptr<D> returned by the def_readwrite lambda is a temporary. After cast() returns, it is destroyed. But the fallback path uses only the raw pointer (via srcs) and reference_internal to parent, so it does not depend on the shared_ptr's lifetime at all. The member's lifetime is managed by the parent Python object.

Generality vs. PR #6003

PR #6003 fixes the problem at the def_readwrite call site only — it checks D's holder type at bind time and skips the aliasing shared_ptr getter when incompatible.

PR #6008 fixes the problem at the shared_ptr-to-Python cast layer, catching all cases where a shared_ptr<T> is converted to Python for a type whose holder isn't shared_ptr or smart_holder. This includes:

  • def_readwrite (the reported bug)
  • Any user function that returns shared_ptr<T> where T has a unique_ptr holder
  • Any other code path that creates a shared_ptr to a non-shared_ptr-holder type

This is exactly what @oremanj suggested:

Can we instead modify the shared_ptr to-Python cast path so that it doesn't blindly assume it's casting to a type with a compatible holder?

Notes for discussion

  1. Hardcoded reference_internal in the fallback: The fallback always uses reference_internal regardless of the originally requested policy. For def_readwrite this is correct (policy is already reference_internal). For other callers with different policies, this is a silent override. However, the old behavior was UB, so any defined behavior is an improvement.

  2. The throw case (no parent): If someone returns shared_ptr<EnumType> from a function without a parent object, the fix throws a clear cast_error. Previously this was UB. The user would need to change their code (e.g., return by value instead of shared_ptr, or change the holder type).

  3. Enum semantics: For enum members accessed via def_readwrite, the returned Python object is a new instance wrapper around the raw pointer, not one of the pre-registered enum singletons. However, py::enum_ defines __eq__ to compare the underlying integer values, so equality comparison (==) is by value and obj.level == TinyLevel.A works correctly.


    if (parent) {
        return type_caster_base<type>::cast(
            srcs, return_value_policy::reference_internal, parent);

rwgk (Collaborator)

With reference to my previous comment: I'm having second thoughts.

It'd be best to make this right in this PR, otherwise we'll risk having a subtle behavior change later. We should inspect policy to decide if we actually need to override it. E.g. I'd expect that we don't need to override return_value_policy::copy.

virtuald (Contributor Author), Mar 22, 2026

Using the original policy doesn't break the tests locally (whereas the absence of the fixes causes the tests to crash) so this seems fine assuming the tests pass in CI.

virtuald (Contributor Author), Mar 22, 2026

gpt-5.4 found a counterexample that fails if the policy isn't changed:

def test_non_smart_holder_member_type_with_smart_holder_owner_aliases_member():
    obj = m.ShWithSimpleStructMember()
    legacy = obj.legacy
    legacy.value = 13
    assert obj.legacy.value == 13

... well, actually, that's surprising to me? The documentation says that def_readwrite should use reference_internal by default.

rwgk (Collaborator), Mar 22, 2026

Here is what Cursor Opus 4.6 1M Thinking is saying (it seems believable to me; tracing through manually would take a good chunk of time):


The counterexample (found by Dustin via gpt-5.4)

def test_non_smart_holder_member_type_with_smart_holder_owner_aliases_member():
    obj = m.ShWithSimpleStructMember()
    legacy = obj.legacy
    legacy.value = 13
    assert obj.legacy.value == 13  # FAILS with policy pass-through

This test expects obj.legacy to return a reference into the parent object
(so mutating the returned object mutates the member). With commit b299f32
("Use default policy"), the test fails because a copy is returned instead.

Root cause: return_value_policy_override

The full call chain for def_readwrite with a smart_holder parent and a
unique_ptr-held member type:

  1. def_readwrite specifies return_value_policy::reference_internal
    (pybind11.h line 2371).

  2. The smart_holder getter (property_cpp_function_sh_member_held_by_value::read())
    returns std::shared_ptr<D> by value — an aliasing shared_ptr that
    points into the parent object.

  3. At call time, pybind11's dispatch applies:

    return_value_policy policy
        = return_value_policy_override<Return>::policy(call.func.policy);

    (pybind11.h line 414)

  4. return_value_policy_override has a specialization for types whose caster
    inherits from type_caster_generic (cast.h lines 1632–1641):

    static return_value_policy policy(return_value_policy p) {
        return !std::is_lvalue_reference<Return>::value
                   && !std::is_pointer<Return>::value
               ? return_value_policy::move
               : p;
    }

    Since std::shared_ptr<D> (returned by value) is neither an lvalue reference
    nor a pointer, the policy is unconditionally overridden to move.

  5. So by the time copyable_holder_caster::cast() is entered, policy is
    move, not reference_internal. The original intent is lost.
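
The override rule in step 4 can be checked in isolation. The sketch below is a reduced standalone version using only the standard library: Policy, override_policy, and D are hypothetical stand-ins, and the real specialization additionally applies only to types whose caster inherits from type_caster_generic, which this sketch omits.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical, reduced stand-in for pybind11's return_value_policy enum.
enum class Policy { copy, move, reference_internal };

// Sketch of the quoted rule: a by-value return (neither an lvalue reference
// nor a pointer) is forced to move; otherwise the requested policy is kept.
template <typename Return>
constexpr Policy override_policy(Policy requested) {
    return (!std::is_lvalue_reference<Return>::value && !std::is_pointer<Return>::value)
               ? Policy::move
               : requested;
}

struct D {};  // hypothetical member type
```

With these definitions, override_policy<std::shared_ptr<D>>(Policy::reference_internal) evaluates to Policy::move, while the same call with const std::shared_ptr<D>& as the return type keeps Policy::reference_internal.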

What happens with each variant of the fallback

Hardcoded reference_internal (original PR #6008 code)

if (parent) {
    return type_caster_base<type>::cast(
        srcs, return_value_policy::reference_internal, parent);
}

In type_caster_generic::cast(), the reference_internal case
(type_caster_base.h lines 1091–1094):

case return_value_policy::reference_internal:
    valueptr = src;          // raw pointer into the parent
    wrapper->owned = false;
    keep_alive_impl(inst, parent);
    break;

Result: the returned Python object is a reference to the member inside the
parent. Mutations are visible through the parent. The parent is kept alive.
Correct def_readwrite semantics.

Pass-through policy (commit b299f32)

if (parent) {
    return type_caster_base<type>::cast(srcs, policy, parent);
}

policy is move (due to the override). In type_caster_generic::cast(),
the move case (type_caster_base.h lines 1070–1088):

case return_value_policy::move:
    if (move_constructor) {
        valueptr = move_constructor(src);
    } else if (copy_constructor) {
        valueptr = copy_constructor(src);
    }
    // ...
    wrapper->owned = true;
    break;

Result: a new independent copy is created. owned = true means the holder
is constructed and will destroy it. Mutations to the copy are invisible to the
parent. Broken def_readwrite semantics.
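
The difference between the two variants is the ordinary difference between handing out a pointer into an object and handing out a copy of it. The following standalone sketch restates the failing test in plain C++; SimpleStruct, ShOwner, get_by_reference, and get_by_copy are hypothetical names chosen to mirror the test, not pybind11 APIs.

```cpp
#include <cassert>

// Hypothetical stand-ins mirroring the test's owner class and its member.
struct SimpleStruct {
    int value = 0;
};

struct ShOwner {
    SimpleStruct legacy;
};

// reference_internal-like access: hand out a pointer into the owner.
inline SimpleStruct *get_by_reference(ShOwner &obj) { return &obj.legacy; }

// move/copy-like access: hand out an independent object.
inline SimpleStruct get_by_copy(const ShOwner &obj) { return obj.legacy; }
```

Writing 13 through get_by_reference changes obj.legacy.value, whereas writing 13 into the result of get_by_copy leaves the owner untouched, which is the behavior the test observes with the pass-through policy.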

Why this concern about silently overriding copy is moot

The worry was: "We should inspect policy to decide if we actually need to
override it. E.g. I'd expect that we don't need to override
return_value_policy::copy."

This concern doesn't apply because return_value_policy_override has already
stomped whatever policy was originally specified to move, unconditionally, for
any by-value shared_ptr return. The policy variable arriving at
copyable_holder_caster::cast() is always move for this code path. There is
no copy to preserve.

The only way policy could be something other than move here is if the
function's declared return type were const std::shared_ptr<D>& (an lvalue
reference), in which case the override would pass through the original policy.
But the smart_holder def_readwrite getter always returns shared_ptr<D> by
value.

Conclusion

Commit b299f32 should be reverted. The fallback path must hardcode
reference_internal:

if (parent) {
    return type_caster_base<type>::cast(
        srcs, return_value_policy::reference_internal, parent);
}

This is correct because:

  • For def_readwrite (the primary use case): preserves reference semantics
    despite the policy override stomping reference_internal → move.
  • For any other caller that reaches this fallback with a parent: the old
    behavior was UB (stuffing a shared_ptr into a unique_ptr holder), so
    reference_internal is a strict improvement regardless.
  • The throw path (no parent) is unchanged and catches the truly unsupported
    case.

rwgk (Collaborator)

I just realized, when Cursor formatted my previous comment, it omitted the "pre-existing design tension" phrase it showed me in the original response; I'm dumping it here raw:

  Bottom line: The hardcoded reference_internal in the fallback-with-parent case was actually correct. By the time the policy reaches copyable_holder_caster::cast(), it has already been
  stomped to move by return_value_policy_override. Passing it through destroys the reference_internal semantics that def_readwrite intended.
  Dustin's surprise is justified -- def_readwrite does specify reference_internal, but the override silently changes it. This is a pre-existing design tension: return_value_policy_override
  is designed to force move for by-value returns of registered types (which makes sense for normal function returns), but it's at odds with the smart_holder def_readwrite getter which
  returns an aliasing shared_ptr by value specifically as a mechanism to carry a reference.
  So b299f32 should be reverted (go back to hardcoding reference_internal in the fallback). Your original [T002] concern about silently overriding copy is moot because the policy arriving
  here is always move (the override already stomped whatever was originally specified).

This reinforces what I have long suspected, if only in a hand-wavy way: the existing return value policy manipulations are tricky. The conclusion is the same: hardcoding reference_internal is probably the best we can do.

virtuald (Contributor Author)

The throw case (no parent): If someone returns shared_ptr<EnumType> from a function without a parent object, the fix throws a clear cast_error. Previously this was UB. The user would need to change their code (e.g., return by value instead of shared_ptr, or change the holder type).

This actually is checked at compile time:

m.def("getSharedEnumAB", []() -> std::shared_ptr<EnumAB> {
    return std::make_shared<EnumAB>();
});
/work/pybind11/include/pybind11/cast.h:973:61: error: static assertion failed: Holder classes are only supported for custom types
  973 |     static_assert(std::is_base_of<base, type_caster<type>>::value,

I did add a test for the "returning shared ptr when holder is unique_ptr" case though.

        RuntimeError,
        match="Unable to convert std::shared_ptr<T> to Python when the bound type does not use std::shared_ptr or py::smart_holder as its holder type",
    ):
        m.getSimpleStructAsShared()

virtuald (Contributor Author)

Weird that this check sometimes doesn't occur on some compilers? Maybe we just omit this test.

rwgk (Collaborator)

At first sight, it seems like -DPYBIND11_TEST_SMART_HOLDER=ON could be the culprit?

But hang on a sec, I'll post another comment. Maybe that one will make this moot (not sure).

rwgk (Collaborator)

JIC you find it's not moot, and JIC it helps: PYBIND11_TEST_SMART_HOLDER (cmake) triggers PYBIND11_RUN_TESTING_WITH_SMART_HOLDER_AS_DEFAULT_BUT_NEVER_USE_IN_PRODUCTION_PLEASE (C++)
